Conversation
Follow-up on apache#5462 to also apply this fix for ChunkedArray. Closes apache#5471 from jorisvandenbossche/ARROW-6652-chunked-array-timezone and squashes the following commits: 89d0044 <Joris Van den Bossche> add helper function 5122451 <Joris Van den Bossche> ARROW-6652: Fix ChunkedArray.to_pandas to retain timezone Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org>
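For context, a minimal Python sketch of the behaviour this follow-up targets (public pyarrow API; the values and timezone are illustrative):

```python
from datetime import datetime, timezone

import pyarrow as pa

ts_type = pa.timestamp("us", tz="UTC")
arr = pa.array([datetime(2019, 9, 26, 12, 0, tzinfo=timezone.utc)], type=ts_type)
chunked = pa.chunked_array([arr, arr])

print(chunked.type)               # timestamp[us, tz=UTC]
# With this fix, ChunkedArray.to_pandas keeps the timezone, matching the
# Table.to_pandas behaviour fixed in apache#5462.
print(chunked.to_pandas().dtype)  # expected: datetime64[ns, UTC]
```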
Closes apache#5439 from pitrou/ARROW-3777-slow-fs and squashes the following commits: 2ca64c5 <Antoine Pitrou> Try to fix Windows failure b02a8c5 <Antoine Pitrou> ARROW-3777: Add Slow input streams and slow filesystem Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…cal plan This PR adds the binary expression to the new physical execution plan, with support for comparison operators (`<`, `<=`, `>`, `>=`, `==`, `!=`) and boolean operators `AND` and `OR`. Other binary expressions, such as math expressions, will be added in a future PR. Closes apache#5478 from andygrove/ARROW-6669 and squashes the following commits: 83bfa77 <Andy Grove> formatting af8d298 <Andy Grove> address PR feedback 9ad3b7f <Andy Grove> formatting bb82a24 <Andy Grove> use expect() instead of unwrap() when downcasting arrays 9b94cc8 <Andy Grove> Implement binary expression with support for comparison and boolean operators Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Paddy Horan <paddyhoran@hotmail.com>
… to Parquet https://issues.apache.org/jira/browse/ARROW-6187 Closes apache#5436 from jorisvandenbossche/ARROW-6187-parquet-extension-type and squashes the following commits: e56164b <Joris Van den Bossche> expose constants in extension_type.h 61d245e <Joris Van den Bossche> clean-up chunked array creation 6b2f190 <Joris Van den Bossche> recreate extension type on read bdda0f7 <Joris Van den Bossche> test that extension metadata is already saved abf2a2f <Joris Van den Bossche> add python test fb4b810 <Joris Van den Bossche> ARROW-6187: Fallback to storage type when writing ExtensionType to Parquet Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
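A rough Python sketch of what the fallback enables, using a hypothetical `UuidType`; the extension-type hooks used here (`pa.ExtensionType`, `pa.register_extension_type`, `pa.ExtensionArray.from_storage`) are the documented pyarrow API, and the read-side behaviour follows the commit messages (the extension type is recreated when known, otherwise the storage type is returned):

```python
import pyarrow as pa
import pyarrow.parquet as pq


class UuidType(pa.ExtensionType):
    """Hypothetical extension type stored as 16-byte binary."""

    def __init__(self):
        super().__init__(pa.binary(16), "example.uuid")

    def __arrow_ext_serialize__(self):
        return b""  # no parameters to serialize

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return UuidType()


pa.register_extension_type(UuidType())

storage = pa.array([b"0" * 16, b"1" * 16], type=pa.binary(16))
arr = pa.ExtensionArray.from_storage(UuidType(), storage)
pq.write_table(pa.table({"id": arr}), "uuid.parquet")

# The extension metadata travels in the Arrow schema stored in the Parquet
# file, so a reader that knows the type gets it back; an unaware reader
# falls back to the binary(16) storage type.
print(pq.read_table("uuid.parquet").schema.field("id").type)
```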
Also adds and improves other C++ doc items. Closes apache#5487 from pitrou/ARROW-6629-cpp-fs-docs and squashes the following commits: d47a008 <Antoine Pitrou> Update docs/source/cpp/io.rst 895f04a <Antoine Pitrou> Try to fix Sphinx error on Travis c40e0e2 <Antoine Pitrou> ARROW-6629: Add filesystem docs Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…atch See the readme and the new tests for example output. This patch also fixes a validation bug in `dictionary()`, aliases that to `DictionaryType$create`, and adds default arguments. Closes apache#5492 from nealrichardson/print-methods and squashes the following commits: c092d30 <Neal Richardson> Merge branch 'print-methods' of github.com:nealrichardson/arrow into print-methods 02afb89 <Neal Richardson> Prettier printing of dictionary type's ordered attribute 5750100 <Neal Richardson> indices in the docs too 6be0328 <Neal Richardson> indices 2d4e744 <Neal Richardson> Add/improve print methods for Array, ChunkedArray, Table, RecordBatch Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
https://issues.apache.org/jira/browse/ARROW-6674 Closes apache#5489 from jorisvandenbossche/ARROW-6674-test-warnings and squashes the following commits: 2a2bb14 <Joris Van den Bossche> ARROW-6674: Fix or ignore the test warnings Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org>
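The entry itself does not list the individual warnings, but the usual pytest mechanics for "fix or ignore" look roughly like this (a generic sketch, not the actual pyarrow test code):

```python
import warnings

import pytest


@pytest.mark.filterwarnings("ignore:.*deprecated.*:DeprecationWarning")
def test_legacy_code_path():
    ...  # an expected, known warning is silenced for just this test


def test_warning_is_part_of_the_contract():
    # When the warning itself is the behaviour under test, assert it instead
    # of letting it leak into the test output.
    with pytest.warns(FutureWarning):
        warnings.warn("this API will change", FutureWarning)
```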
…of StructArray https://issues.apache.org/jira/browse/ARROW-6158 Closes apache#5488 from jorisvandenbossche/ARROW-6158-struct-array-validation and squashes the following commits: 7573781 <Joris Van den Bossche> ARROW-6158: Validate child array types with type fields of StructArray Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org>
…t be base64-encoded to be UTF-8 compliant I have added a simple base64 implementation (Zlib license) to arrow/vendored from https://github.com/ReneNyffenegger/cpp-base64 Closes apache#5493 from wesm/ARROW-6678 and squashes the following commits: c058e86 <Wes McKinney> Simplify, add MSVC exports 06f75cd <Wes McKinney> Fix Python unit test that needs to base64-decode now eabb121 <Wes McKinney> Fix LICENSE.txt, add iwyu export b3a584a <Wes McKinney> Add vendored base64 C++ implementation and ensure that Thrift KeyValue in Parquet metadata is UTF-8 Authored-by: Wes McKinney <wesm+git@apache.org> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
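A hedged Python illustration of where this shows up: pyarrow stores the serialized Arrow schema in the Parquet key/value metadata under `ARROW:schema`, and after this change that value is base64 text rather than raw bytes (file name and column are illustrative):

```python
import base64

import pyarrow as pa
import pyarrow.parquet as pq

pq.write_table(pa.table({"x": [1, 2, 3]}), "kv.parquet")

kv = pq.read_metadata("kv.parquet").metadata   # key/value metadata, bytes -> bytes
encoded = kv[b"ARROW:schema"]                  # base64-encoded after this change
ipc_schema_bytes = base64.b64decode(encoded)   # raw IPC-serialized schema message
print(len(ipc_schema_bytes))
```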
…n" operator This PR implements the physical execution plan for the selection operator (the WHERE clause in a SQL query). In order to have working tests, I also had to implement some subset of expressions (column reference, literal value, comparison expressions, and CAST). However, the goal of this PR is not to add complete support for all expressions but to implement the Selection operator. I will create separate JIRA/PRs for adding support for other expressions and data types in the physical query plan. Closes apache#5320 from andygrove/ARROW-6089 and squashes the following commits: 6cad327 <Andy Grove> Implement selection operator Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Andy Grove <andygrove73@gmail.com>
…quet _build_nested_path has a reference cycle because the closured function refers to the parent cell which also refers to the closured function again. Address this by clearing the reference to the function from the parent cell before returning. open_dataset_file is partialed with self inside the ParquetFile class. Prevent this by using a weakref instead. Closes apache#5476 from AaronOpfer/master and squashes the following commits: f4909e0 <Wes McKinney> Fix flakes 883ab86 <Aaron Opfer> ARROW-6667: remove cyclical object references in pyarrow.parquet Lead-authored-by: Aaron Opfer <aaron.opfer@chicagotrading.com> Co-authored-by: Wes McKinney <wesm+git@apache.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
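Both patterns, sketched generically in plain Python (the names are illustrative, not the actual pyarrow.parquet internals): clearing a closure's parent cell once the function is no longer needed, and binding a weak reference instead of `self` in a stored callable.

```python
import functools
import weakref


def build_path(pieces):
    # `join` closes over itself for recursion, so the cell holding it and the
    # function object reference each other: a cycle the GC would have to break.
    def join(parts):
        return parts[0] if len(parts) == 1 else parts[0] + "/" + join(parts[1:])

    result = join(pieces)
    join = None  # clear the parent cell before returning, as the commit describes
    return result


class FileLike:
    def __init__(self):
        # functools.partial(self._open, ...) would keep `self` alive through the
        # stored callable; a weak reference avoids that, mirroring the
        # open_dataset_file change.
        self_ref = weakref.ref(self)
        self.opener = functools.partial(FileLike._open_weak, self_ref)

    @staticmethod
    def _open_weak(self_ref, path):
        obj = self_ref()
        return obj._open(path) if obj is not None else None

    def _open(self, path):
        return open(path, "rb")
```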
- Adds StatusDetail to the docs - Fixes the Doxygen config to work with `ARROW_FLIGHT_EXPORT` - Adds error codes to the format docs (though they're not part of the formal format) - Touches up some docstrings and adds missing classes to the API docs - Add a basic description of how to set up a Flight server/client in C++ Closes apache#5491 from lihalite/flight-docs and squashes the following commits: 2948983 <David Li> ARROW-6677: Document Flight in C++ Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…taframe - add scanReverse() to dataFrame and filteredDataframe - add tests for scanReverse() Closes apache#5480 from mmaclach/master and squashes the following commits: 01faae8 <mmaclach> JS: scanReverse Authored-by: mmaclach <mmaclachlan@ccri.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324). Closes apache#5445 from nealrichardson/win-fs-fix and squashes the following commits: 515d710 <Neal Richardson> Rename functions 1775a6d <Neal Richardson> Fix two other absolute paths I missed 0dce0d7 <Neal Richardson> Munge paths directly on windows f03815e <Neal Richardson> Normalize paths for filesystem API on Windows Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…ata sources I discovered this last minute while running manual tests. I have been able to run parallel queries against parquet files using this branch as a dependency. Closes apache#5494 from andygrove/ARROW-6086 and squashes the following commits: 77bee15 <Andy Grove> Replace unwrap with Result combinator cd11b97 <Andy Grove> don't panic c751753 <Andy Grove> Add support for partitioned parquet data sources 25eaf45 <Andy Grove> Move build_file_list into common module Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Andy Grove <andygrove73@gmail.com>
cf. https://github.com/jeroen/autobrew/blob/gh-pages/LICENCE.txt Closes apache#5501 from nealrichardson/autobrew-license and squashes the following commits: 3e790f5 <Neal Richardson> MIT license for autobrew Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org>
Closes apache#5500 from pitrou/ARROW-6630-file-format-docs and squashes the following commits: aa5c57d <Antoine Pitrou> ARROW-6630: Document C++ file formats Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Related to [ARROW-6472](https://issues.apache.org/jira/browse/ARROW-6472). If we use the visitor API this way: `RangeEqualsVisitor visitor = new RangeEqualsVisitor(vector1, vector2); vector3.accept(visitor, range)` and vector1/vector2 are, say, `StructVector`s while vector3 is an `IntVector`, things can go bad: we'll use compareBaseFixedWidthVectors() and do wrong type casts for vector1/vector2. Discussion: apache#5195 (comment) https://issues.apache.org/jira/browse/ARROW-6472 Closes apache#5483 from tianchen92/ARROW-6472 and squashes the following commits: 3d3d295 <tianchen> add test 12e4aa2 <tianchen> ARROW-6472: ValueVector#accept may has potential cast exception Authored-by: tianchen <niki.lj@alibaba-inc.com> Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
Initial support for array reader. List and map support will come later. Closes apache#5378 from liurenjie1024/arrow-4218 and squashes the following commits: 433abab <Renjie Liu> Remove unwraps with result 6407dee <Renjie Liu> Fix format 4dd9a01 <Renjie Liu> Initial support for array reader 215f73b <Renjie Liu> struct array reader 2e898ff <Renjie Liu> test done for primitive array reader Authored-by: Renjie Liu <liurenjie2008@gmail.com> Signed-off-by: Andy Grove <andygrove73@gmail.com>
Closes apache#5503 from andygrove/ARROW-6687 and squashes the following commits: c0ded56 <Andy Grove> Bug fix in DataFusion Parquet reader Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Andy Grove <andygrove73@gmail.com>
Closes apache#5499 from alippai/patch-1 and squashes the following commits: d2fd03e <Ádám Lippai> Update README.md Authored-by: Ádám Lippai <adam@rigo.sk> Signed-off-by: Andy Grove <andygrove73@gmail.com>
…able This is needed to use the correct download URL for the RC. Closes apache#5506 from kou/packaging-linux-restore-arrow-version and squashes the following commits: 7face8a <Sutou Kouhei> Restore ARROW_VERSION environment variable Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Related to [ARROW-6709](https://issues.apache.org/jira/browse/ARROW-6709). Currently, in several consumers, currentIndex is only incremented when the ResultSet value is not null. However, if the ResultSet contains null values, the Arrow vector's valueCount ends up incorrect. Closes apache#5511 from tianchen92/ARROW-6709 and squashes the following commits: b1e9d5a <tianchen> ARROW-6709: Jdbc adapter currentIndex should increment when value is null Authored-by: tianchen <niki.lj@alibaba-inc.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Construct tree structure from std::vector<fs::FileStats>, following the path directory hierarchy. Closes apache#5430 from fsaintjacques/ARROW-6606-path-tree and squashes the following commits: 43d19fa <François Saint-Jacques> Address comments 60b5945 <François Saint-Jacques> Simplify implementation 109ea85 <François Saint-Jacques> ARROW-6606: Add PathTree tree structure Authored-by: François Saint-Jacques <fsaintjacques@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…lity https://issues.apache.org/jira/browse/ARROW-6683 @wesm is this more or less what you were thinking? Closes apache#5498 from jorisvandenbossche/ARROW-6683-fastparquet-cross-testing and squashes the following commits: 4c1e3aa <Joris Van den Bossche> add comment 1d35f3b <Joris Van den Bossche> add fastparquet mark c5d6161 <Joris Van den Bossche> ARROW-6683: Test for fastparquet <-> pyarrow cross-compatibility Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
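The sort of round trip these tests exercise, as a hedged sketch (file names and data are illustrative; `fastparquet.write` / `ParquetFile.to_pandas` and `pyarrow.parquet.write_table` / `read_table` are the two libraries' standard entry points):

```python
import fastparquet
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

# pyarrow writes, fastparquet reads
pq.write_table(pa.Table.from_pandas(df), "written_by_pyarrow.parquet")
back_fp = fastparquet.ParquetFile("written_by_pyarrow.parquet").to_pandas()

# fastparquet writes, pyarrow reads
fastparquet.write("written_by_fastparquet.parquet", df)
back_pa = pq.read_table("written_by_fastparquet.parquet").to_pandas()

# For simple numeric columns the round trip should be exact; more exotic
# types may need dtype tolerance.
pd.testing.assert_frame_equal(back_fp, df)
pd.testing.assert_frame_equal(back_pa, df)
```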
…ntegration tests Arrow Java Writer now requires an IpcOption for some APIs, this patch fixes the compilation to run Spark Integration tests. Closes apache#5465 from BryanCutler/spark-integration-patch-ARROW-6429 and squashes the following commits: 918ab91 <Bryan Cutler> Remove redundant message 655937d <Bryan Cutler> Changes to Spark integration test config dd2483f <Bryan Cutler> Add patch to rat excludes 48b2eac <Bryan Cutler> Adding patch to fix Spark compilation with IpcOption Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Rust 1.40.0-nightly just got released and caused builds to start failing Closes apache#5519 from andygrove/ARROW-6716 and squashes the following commits: 5301d7b <Andy Grove> trigger rebuild b651d7b <Andy Grove> Use 1.40.0-nightly-2019-09-25 Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Paddy Horan <paddyhoran@hotmail.com>
…row specific) This adds parameters to `write_parquet()` to control compression, whether to use dictionary, etc ... on top of the C++ classes `parquet::WriterProperties` and `parquet::ArrowWriterProperties` e.g. ```r write_parquet(tab, file, compression = "gzip", compression_level = 7) ``` Closes apache#5451 from romainfrancois/ARROW-6532/write_parquet_compression and squashes the following commits: 413dd41 <Romain Francois> test make_valid_version() 50555f8 <Romain Francois> rename arguments to `x` and `sink` 9aff79b <Romain Francois> implement ==.Object that calls $Equals instead of implementing for each class. ecd9218 <Romain Francois> rework documentation for write_parquet() 56dac33 <Romain Francois> Move read_parquet() and write_parquet() to top of the file 45ec63b <Romain François> Update r/R/parquet.R 66c51fd <Romain Francois> added all.equal.Object() that uses == c5549de <Romain Francois> Test ==.Table 5ade52d <Romain Francois> wrong length for use_dictionary and write_statistics 00cc214 <Romain Francois> abstract various ParquetWriterPropertiesBuilder$set_*() methods 1fdcc0b <Romain Francois> suggestsions from @nealrichardson 9bee8de <Romain Francois> define and use internal make_valid_version() function 004cf90 <Romain Francois> M%ake compression_from_name() vectorized 86d9ff4 <Romain Francois> Remove the _ from builder classes 6c4f003 <Romain Francois> add test helper so that we actually can test parquet roundtrip d318a66 <Romain Francois> ==.Table 7f1c184 <Romain Francois> align arguments following tidyverse style guide 72caaab <Romain Francois> using assert_that() 738ea6e <Romain Francois> Remove $default() methods and use $create() wityh default arguments instead. 1166264 <Romain Francois> using make_valid_time_unit() 4055f67 <Romain Francois> More flexible arguments use_dictionary= and write_statistics= 2f2ae00 <Romain Francois> More flexible compression= and compression_level= 1e3b5b6 <Romain Francois> document() 2dd2cb9 <Romain Francois> + compression_level= in write_parquet() b8337e1 <Romain Francois> lint fa8990b <Romain Francois> Expose options from ParquetWriterProperties and ParquetArrowWriterProperties to write_parquet() 09ea0ad <Romain Francois> + ParquetWriterProperties$create() and associated ParquetWriterProperties_Builder class skeleton 1b84ad4 <Romain Francois> Exposing classes parquet::arrow::ArrowWriterProperties and parquet::arrow::WriterProperties to R side 0e09ac8 <Romain Francois> lint aa34095 <Romain Francois> passing down the right stream 9ed32b6 <Romain Francois> Make write_parquet() generic, internal impl using streams rather than file path for more flexibility Lead-authored-by: Romain Francois <romain@rstudio.com> Co-authored-by: Romain François <romain@purrple.cat> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#5514 from nealrichardson/fix-lint and squashes the following commits: ea8b8e7 <Neal Richardson> Note that clang-format-7 is required f83f0e1 <Neal Richardson> Incorporate @kou's suggestion 6bcc063 <Neal Richardson> Note how to use lint.sh and have it look for clang-format rather than hard-code its location Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
I noticed this now-defunct call to `table()` while reviewing another PR. We clearly weren't testing this case because if you were to pass a data.frame in, you'd get a segfault. This patch adds tests and fixes the issue. Closes apache#5518 from nealrichardson/record-batch-writer-fix and squashes the following commits: afad9fe <Neal Richardson> Fix untested RecordBatchWriter case Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
kszucs pushed a commit that referenced this pull request on Feb 24, 2020:
…comments. The reset method allows the data structures to be re-used so they don't have to be allocated over and over again. Closes apache#6430 from richardartoul/ra/merge-upstream and squashes the following commits: 5a08281 <Richard Artoul> Add license to test file d76be05 <Richard Artoul> Add test for data reset d102b1f <Richard Artoul> Add tests d3e6e67 <Richard Artoul> cleanup comments c8525ae <Richard Artoul> Add Reset method to int array (#5) 489ca25 <Richard Artoul> Fix array.setData() to retain before release (#4) 88cd05f <Richard Artoul> Add reset method to Data (#3) 6d1b277 <Richard Artoul> Add Reset() method to String array (#2) dca2303 <Richard Artoul> Add Reset method to buffer and cleanup comments (#1) Lead-authored-by: Richard Artoul <richard.artoul@datadoghq.com> Co-authored-by: Richard Artoul <richardartoul@gmail.com> Signed-off-by: Sebastien Binet <binet@cern.ch>
kszucs pushed a commit that referenced this pull request on May 11, 2020:
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point. I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test ..................... Passed 4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test .................... Passed 0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............ Passed 0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ...................... Passed 0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................ Passed 0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test .................... Passed 0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ...................... Passed 0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test ..................... Passed 0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test .................... Passed 0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test ............. Passed 0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test ....................... Passed 0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ............... Passed 0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ...................... Passed 1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test .................. Passed 0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ............... Passed 0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test ............. Passed 3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ................... Passed 0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ................... Passed 0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test ................. Passed 2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed 5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test ......... Passed 1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ........... Passed 0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ........... Passed 0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test .............. Passed 0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test .............. Passed 2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test .............. Passed 0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test ............. Passed 0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ... Passed 3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test .... Passed 1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test ..... Passed 0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ............... Passed 0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test ......... Passed 14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ........... Passed 7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test .............. Passed 4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............ Passed 8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ........... Passed 0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test ......... Passed 0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test .......... Passed 0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test .............. Passed 0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............ Passed 0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test ......... Passed 0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ........... Passed 0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................ Passed 1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ...................... Passed 0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ................... Passed 0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............ Passed 5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ........... Passed 0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test .................. Passed 0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test .......... Passed 0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ...................... Passed 0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ............... Passed 1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests   = 27.38 sec (27 tests)
arrow_compute = 45.11 sec (14 tests)
arrow_dataset =  1.21 sec (7 tests)
arrow_ipc     =  6.20 sec (3 tests)
unittest      = 79.91 sec (51 tests)

Total Test time (real) = 79.99 sec

The following tests FAILED:
    20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes apache#7142 from kiszk/ARROW-8754 Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kszucs pushed a commit that referenced this pull request on Apr 7, 2021:
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant, hence the deadlock. To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number and status of in-flight requests. Closes apache#9842 from westonpace/bugfix/arrow-12040 Lead-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>